Gumbel based p-value approximations for spatial scan statistics
Abstract
BACKGROUND The spatial and space-time scan statistics are commonly applied for the detection of geographical disease clusters. Monte Carlo hypothesis testing is typically used to test whether the geographical clusters are statistically significant as there is no known way to calculate the null distribution analytically. In Monte Carlo hypothesis testing, simulated random data are generated multiple times under the null hypothesis, and the p-value is r/(R + 1), where R is the number of simulated random replicates of the data and r is the rank of the test statistic from the real data compared to the same test statistics calculated from each of the random data sets. A drawback to this powerful technique is that each additional digit of p-value precision requires ten times as many replicated datasets, and the additional processing can lead to excessive run times.
RESULTS We propose a new method for obtaining more precise p-values with a given number of replicates. The collection of test statistics from the random replicates is used to estimate the true distribution of the test statistic under the null hypothesis by fitting a continuous distribution to these observations. The choice of distribution is critical, and for the spatial and space-time scan statistics, the extreme value Gumbel distribution performs very well while the gamma, normal and lognormal distributions perform poorly. From the fitted Gumbel distribution, we show that it is possible to estimate the analytical p-value with great precision even when the test statistic is far out in the tail beyond any of the test statistics observed in the simulated replicates. In addition, Gumbel-based rejection probabilities have smaller variability than Monte Carlo-based rejection probabilities, suggesting that the proposed approach may result in greater power than the true Monte Carlo hypothesis test for a given number of replicates.
CONCLUSIONS For large data sets, it is often advantageous to replace computer-intensive Monte Carlo hypothesis testing with this new method of fitting a Gumbel distribution to random data sets generated under the null, in order to reduce computation time and obtain much more precise p-values and slightly higher statistical power.
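The contrast between the two p-values described in the abstract can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation: the abstract does not say how the Gumbel parameters are estimated, so the method-of-moments fit below is an assumption, as are the function names, the tie-handling rule in the rank, and the synthetic replicates (drawn from a Gumbel distribution with location 5 and scale 1 purely for demonstration, not from a real scan statistic).

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def monte_carlo_p_value(observed, replicates):
    """Rank-based p-value r / (R + 1) from the abstract.

    Assumption: r is 1 plus the number of replicates that are >= observed,
    so the smallest attainable p-value is 1 / (R + 1).
    """
    R = len(replicates)
    r = 1 + sum(1 for t in replicates if t >= observed)
    return r / (R + 1)


def gumbel_p_value(observed, replicates):
    """Upper-tail p-value from a Gumbel distribution fitted to the replicates.

    Assumption: parameters are estimated by the method of moments,
    scale = sd * sqrt(6) / pi and location = mean - gamma * scale.
    """
    mean = statistics.fmean(replicates)
    sd = statistics.stdev(replicates)
    scale = sd * math.sqrt(6) / math.pi
    location = mean - EULER_GAMMA * scale
    z = (observed - location) / scale
    if z < -30:
        # exp(-exp(30)) is numerically zero, so the tail probability is 1
        return 1.0
    # 1 - exp(-exp(-z)), computed with expm1 to keep precision in the far tail
    return -math.expm1(-math.exp(-z))


# Hypothetical demonstration data: 999 synthetic "null" statistics.
# If E ~ Exp(1), then location - scale * log(E) follows a Gumbel distribution.
rng = random.Random(0)
null_stats = [5.0 - 1.0 * math.log(rng.expovariate(1.0)) for _ in range(999)]

observed_stat = 15.0  # hypothetical value beyond the range of the replicates
print("Monte Carlo p-value:", monte_carlo_p_value(observed_stat, null_stats))
print("Gumbel-based p-value:", gumbel_p_value(observed_stat, null_stats))
```

With 999 replicates the rank-based p-value cannot fall below 1/1000, whereas the fitted distribution returns a tail probability for statistics beyond every simulated value, which is the behaviour the abstract describes.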